Abstract

The second eigenvalue of the Laplacian matrix and its associated eigenvector are fundamental
features of an undirected graph, and as such they have found widespread use in scientific
computing, machine learning, and data analysis. In many applications, however, graphs that 
arise have several local regions of interest, and the second eigenvector will typically fail to provide
information fine-tuned to each local region. In this paper, we introduce a locally-biased
analogue of the second eigenvector, and we demonstrate its usefulness at highlighting local 
properties of data graphs in a semi-supervised manner. To do so, we first view the second 
eigenvector as the solution to a constrained optimization problem, and we incorporate the 
local information as an additional constraint; we then characterize the optimal solution to 
this new problem and show that it can be interpreted as a generalization of a Personalized 
PageRank vector; and finally, as a consequence, we show that the solution can be computed 
in nearly-linear time. In addition, we show that this locally-biased vector can be used to 
compute an approximation to the best partition near an input seed set in a manner analogous
to the way in which the second eigenvector of the Laplacian can be used to obtain an
approximation to the best partition in the entire input graph. Such a primitive is useful for 
identifying and refining clusters locally, as it allows us to focus on a local region of interest in 
a semi-supervised manner. Finally, we provide a detailed empirical evaluation of our method 
by showing how it can be applied to finding locally-biased sparse cuts around an input vertex
seed set in social and information networks. 



1 Introduction 

Spectral methods are popular in machine learning, data analysis, and applied mathematics due 
to their strong underlying theory and their good performance in a wide range of applications. 
In the study of undirected graphs, in particular, spectral techniques play an important role, 
as many fundamental structural properties of a graph depend directly on spectral quantities 
associated with matrices representing the graph. Two fundamental objects of study in this area 
are the second smallest eigenvalue of the graph Laplacian and its associated eigenvector. These 
quantities determine many features of the graph, including the behavior of random walks and the 
presence of sparse cuts. This relationship between the graph structure and an easily-computable 
quantity has been exploited in data clustering, community detection, image segmentation, parallel 
computing, and many other applications. 

A potential drawback of using the second eigenvalue and its associated eigenvector is that they
are inherently global quantities, and thus they may not be sensitive to very local information. 
For instance, a sparse cut in a graph may be poorly correlated with the second eigenvector (and 
even with all the eigenvectors of the Laplacian) and thus invisible to a method based only on 
eigenvector analysis. Similarly, based on domain knowledge one might have information about a 
specific target region in the graph, in which case one might be interested in finding clusters only 
near this prespecified local region, e.g., in a semi-supervised manner; but this local region might 
be essentially invisible to a method that uses only global eigenvectors. For these and related 
reasons, standard global spectral techniques can have substantial difficulties in semi-supervised 
settings, where the goal is to learn more about a locally-biased target region of the graph. 

In this paper, we provide a methodology to construct a locally-biased analogue of the second 
eigenvalue and its associated eigenvector, and we demonstrate both theoretically and empirically 
that this localized vector inherits many of the good properties of the global second eigenvector. 
Our approach is inspired by viewing the second eigenvector as the optimum of a constrained 
global quadratic optimization program. To model the localization step, we modify this program 
by adding a natural locality constraint. This locality constraint requires that any feasible solution 
have sufficient correlation with the target region, which we assume is given as input in the form 
of a set of nodes or a distribution over vertices. The resulting optimization problem, which we 
name LocalSpectral and which is displayed in Figure 1, is the main object of our work.

The main advantage of our formulation is that an optimal solution to LocalSpectral captures 
many of the same structural properties as the global eigenvector, except in a locally-biased setting. 
For example, as with the global optimization program, our locally-biased optimization program 
has an intuitive geometric interpretation. Similarly, as with the global eigenvector, an optimal 
solution to LocalSpectral is efficiently computable. To show this, we characterize the optimal 
solutions of LocalSpectral and show that such a solution can be constructed in nearly-linear time 
by solving a system of linear equations. In applications where the eigenvectors of the graph are
pre-computed and only a small number of them are needed to describe the data, the optimal solution
to our program can be obtained by performing a small number of inner product computations. 
Finally, the optimal solution to LocalSpectral can be used to derive bounds on the mixing time 
of random walks that start near the local target region as well as on the existence of sparse cuts 
near the locally-biased target region. In particular, it lower bounds the conductance of cuts as 
a function of how well-correlated they are with the seed vector. This will allow us to exploit 
the analogy between global eigenvectors and our localized analogue to design an algorithm for 
discovering sparse cuts near an input seed set of vertices. 

In order to illustrate the empirical behavior of our method, we will describe its performance 
on the problem of finding locally-biased sparse cuts in real data graphs. Subsequent to the 
dissemination of the initial technical report version of this paper, our methodology was applied 
to the problem of finding, given a small number of "ground truth" labels that correspond to 
known segments in an image, the segments in which those labels reside [23]. This computer 
vision application will be discussed briefly. Then, we will describe in detail how our algorithm 
for discovering sparse cuts near an input seed set of vertices may be applied to the problem of 
exploring data graphs locally and to identifying locally-biased clusters and communities in a more 
difficult-to-visualize social network application. In addition to illustrating the performance of the 
method in a practical application related to the one that initially motivated this work [20, 21, 22],
this social graph application will illustrate how the various "knobs" of our method can be used 
in practice to explore the structure of data graphs in a locally-biased manner. 

Recent theoretical work has focused on using spectral ideas to find good clusters nearby an 
input seed set of nodes [30, 1, 10]. These methods are based on running a number of local random
walks around the seed set and using the resulting distributions to extract information about
clusters in the graph. Recent empirical work has used Personalized PageRank, a particular variant 
of a local random walk, to characterize very finely the clustering and community structure in a 
wide range of very large social and information networks [2, 20, 21, 22]. In contrast with previous
methods, our local spectral method is the first to be derived in a direct way from an explicit 
optimization problem inspired by the global spectral problem. Interestingly, our characterization 
also shows that optimal solutions to LocalSpectral are generalizations of Personalized PageRank, 
providing additional insight into why local random walk methods work well in practice.

In the next section, we will describe relevant background and notation; and then, in Section 3,
we will present our formulation of a locally-biased spectral optimization program, the solution
of which will provide a locally-biased analogue of the second eigenvector of the graph Laplacian.
Then, in Section 4, we will describe how our method may be applied to identifying and refining
locally-biased partitions in a graph; and in Section 5, we will provide a detailed empirical evaluation
of our algorithm. Finally, in Section 6, we will conclude with a discussion of our results in a
broader context.



2 Background and Notation

Let $G = (V, E, w)$ be a connected undirected graph with $n = |V|$ vertices and $m = |E|$ edges,
in which edge $\{i, j\}$ has weight $w_{ij}$. For a set of vertices $S \subseteq V$ in a graph, the volume of $S$ is
$\mathrm{vol}(S) := \sum_{i \in S} d_i$, in which case the volume of the graph $G$ is $\mathrm{vol}(G) := \mathrm{vol}(V) = 2m$. In the
following, $A_G \in \mathbb{R}^{V \times V}$ will denote the adjacency matrix of $G$, while $D_G \in \mathbb{R}^{V \times V}$ will denote the
diagonal degree matrix of $G$, i.e., $D_G(i,i) = d_i = \sum_{\{i,j\} \in E} w_{ij}$, the weighted degree of vertex $i$.
The Laplacian of $G$ is defined as $L_G := D_G - A_G$. (This is also called the combinatorial Laplacian,
in which case the normalized Laplacian of $G$ is $\mathcal{L}_G := D_G^{-1/2} L_G D_G^{-1/2}$.)

The Laplacian is the symmetric matrix having quadratic form $x^T L_G x = \sum_{ij \in E} w_{ij} (x_i - x_j)^2$,
for $x \in \mathbb{R}^V$. This implies that $L_G$ is positive semidefinite and that the all-one vector $1 \in \mathbb{R}^V$ is
the eigenvector corresponding to the smallest eigenvalue 0. For a symmetric matrix $A$, we will
use $A \succeq 0$ to denote that it is positive semi-definite. Moreover, given two symmetric matrices
$A$ and $B$, the expression $A \succeq B$ will mean $A - B \succeq 0$. Further, for two $n \times n$ matrices $A$ and
$B$, we let $A \circ B$ denote $\mathrm{Tr}(A^T B)$. Finally, for a matrix $A$, let $A^+$ denote its (uniquely defined)
Moore-Penrose pseudoinverse.

For two vectors $x, y \in \mathbb{R}^n$, and the degree matrix $D_G$ for a graph $G$, we define the
degree-weighted inner product as $x^T D_G y := \sum_{i=1}^n x_i y_i d_i$. Given a subset of vertices $S \subseteq V$, we denote
by $1_S$ the indicator vector of $S$ in $\mathbb{R}^V$ and by $1$ the vector in $\mathbb{R}^V$ having all entries set equal to
1. We consider the following definition of the complete graph $K_n$ on the vertex set $V$:
$A_{K_n} := \frac{1}{\mathrm{vol}(G)} D_G 1 1^T D_G$. Note that this is not the standard complete graph, but a weighted version of it,
where the weights depend on $D_G$. With this scaling we have $D_{K_n} = D_G$. Hence, the Laplacian
of the complete graph defined in this manner becomes $L_{K_n} = D_G - \frac{1}{\mathrm{vol}(G)} D_G 1 1^T D_G$.

In this paper, the conductance $\phi(S)$ of a cut $(S, \bar{S})$ is $\phi(S) := \mathrm{vol}(G) \cdot \frac{|E(S, \bar{S})|}{\mathrm{vol}(S) \cdot \mathrm{vol}(\bar{S})}$. A sparse
cut, also called a good-conductance partition, is one for which $\phi(S)$ is small. The conductance of
the graph $G$ is then $\phi(G) = \min_{S \subseteq V} \phi(S)$. Note that the conductance of a set $S$, or equivalently
a cut $(S, \bar{S})$, is often defined as $\phi'(S) = |E(S, \bar{S})| / \min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\}$. This notion is equivalent
to $\phi(S)$, in that the value $\phi(G)$ thereby obtained for the conductance of the graph $G$ differs
by no more than a factor of 2 times the constant $\mathrm{vol}(G)$, depending on which notion we use for
the conductance of a set.
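
To make the notation concrete, the following is a minimal NumPy sketch that builds $A_G$, $D_G$, and $L_G$ and evaluates the conductance $\phi(S)$ as defined above. The toy graph (two triangles joined by one edge) and the helper name `conductance` are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): triangles {0,1,2} and {3,4,5}
# joined by the single edge {2,3}; every edge has weight 1.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0      # adjacency matrix A_G with w_ij = 1

d = A.sum(axis=1)                # weighted degrees d_i
D = np.diag(d)                   # degree matrix D_G
L = D - A                        # combinatorial Laplacian L_G = D_G - A_G
vol_G = d.sum()                  # vol(G), equal to 2m for unit weights


def conductance(S):
    """phi(S) = vol(G) * |E(S, S-bar)| / (vol(S) * vol(S-bar))."""
    mask = np.zeros(n, dtype=bool)
    mask[list(S)] = True
    cut = A[mask][:, ~mask].sum()            # total weight crossing the cut
    return vol_G * cut / (d[mask].sum() * d[~mask].sum())


# Quadratic form identity: x^T L_G x = sum over edges of w_ij (x_i - x_j)^2.
x = np.arange(n, dtype=float)
quad_form = x @ L @ x
edge_sum = sum((x[i] - x[j]) ** 2 for i, j in edges)

phi = conductance({0, 1, 2})     # the cut separating the two triangles
```

On this graph the cut between the two triangles has one crossing edge and volume 7 on each side, so the formula gives $14 \cdot 1 / (7 \cdot 7) = 2/7$.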

Left: Spectral(G)

    minimize    $x^T L_G x$
    subject to  $x^T D_G x = 1$
                $(x^T D_G 1)^2 = 0$
                $x \in \mathbb{R}^V$

Right: LocalSpectral(G, s, $\kappa$)

    minimize    $x^T L_G x$
    subject to  $x^T D_G x = 1$
                $(x^T D_G 1)^2 = 0$
                $(x^T D_G s)^2 \geq \kappa$
                $x \in \mathbb{R}^V$

Figure 1: Global and local spectral optimization programs. Left: The usual spectral program
Spectral(G). Right: Our new locally-biased spectral program LocalSpectral(G, s, $\kappa$). In both cases,
the optimization variable is the vector $x \in \mathbb{R}^n$.



3 The LocalSpectral Optimization Program 

In this section, we introduce the local spectral optimization program LocalSpectral$(G, s, \kappa)$ as
a strengthening of the usual global spectral program Spectral$(G)$. To do so, we will augment
Spectral$(G)$ with a locality constraint of the form $(x^T D_G s)^2 \geq \kappa$, for a seed vector $s$ and a
correlation parameter $\kappa$. Both these programs are homogeneous quadratic programs, with optimization
variable the vector $x \in \mathbb{R}^V$, and thus any solution vector $x$ is essentially equivalent to $-x$ for the
purpose of these optimizations. Hence, in the following we do not differentiate between $x$ and
$-x$, and we assume a suitable direction is chosen in each instance.



3.1 Motivation for the Program 

Recall that the second eigenvalue $\lambda_2(G)$ of the Laplacian $L_G$ can be viewed as the optimum of
the standard optimization problem Spectral$(G)$ described in Figure 1. In matrix terminology, the
corresponding optimal solution $v_2$ is a generalized eigenvector of $L_G$ with respect to $D_G$. For our
purposes, however, it is best to consider the geometric meaning of this optimization formulation.
To do so, suppose we are operating in a vector space $\mathbb{R}^V$, where the $i$th dimension is stretched
by a factor of $d_i$, so that the natural identity operator is $D_G$ and the inner product between
two vectors $x$ and $y$ is given by $\sum_{i \in V} d_i x_i y_i = x^T D_G y$. In this representation, Spectral$(G)$ is
seeking the vector $x \in \mathbb{R}^V$ that is orthogonal to the all-one vector, lies on the unit sphere, and
minimizes the Laplacian quadratic form. Note that such an optimum $v_2$ may lie anywhere on the
unit sphere.
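
As a concrete illustration of Spectral(G), the sketch below computes $\lambda_2(G)$ and $v_2$ by reducing the generalized eigenproblem $L_G x = \lambda D_G x$ to a symmetric one through the normalized Laplacian. The toy graph and variable names are assumptions made for illustration, and a dense eigensolver is used only because the example is tiny.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): two triangles joined by edge {2,3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A

# L x = lam D x  is equivalent to  (D^{-1/2} L D^{-1/2}) y = lam y  with  y = D^{1/2} x.
d_isqrt = 1.0 / np.sqrt(d)
N = d_isqrt[:, None] * L * d_isqrt[None, :]   # normalized Laplacian
evals, evecs = np.linalg.eigh(N)              # eigenvalues in ascending order

lam2 = evals[1]                               # second smallest eigenvalue
v2 = d_isqrt * evecs[:, 1]                    # map y back to x = D^{-1/2} y

# Feasibility for Spectral(G) in the degree-weighted geometry.
unit_norm = v2 @ D @ v2                       # should equal 1
orthogonality = v2 @ d                        # v2^T D_G 1, should equal 0
objective = v2 @ L @ v2                       # should equal lam2
```

The sign of `v2` is arbitrary, which matches the remark that $x$ and $-x$ are not distinguished.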

Our goal here is to modify Spectral$(G)$ to incorporate a bias towards a target region which
we assume is given to us as an input vector $s$. We will assume (without loss of generality)
that $s$ is properly normalized and orthogonalized so that $s^T D_G s = 1$ and $s^T D_G 1 = 0$. While
$s$ can be a general unit vector orthogonal to $1$, it may be helpful to think of $s$ as the indicator
vector of one or more vertices in $V$, corresponding to the target region of the graph. We obtain
LocalSpectral$(G, s, \kappa)$ from Spectral$(G)$ by requiring that a feasible solution also have a sufficiently
large correlation with the vector $s$. This is achieved by the addition of the constraint $(x^T D_G s)^2 \geq \kappa$,
which ensures that the projection of $x$ onto the direction $s$ is at least $\sqrt{\kappa}$ in absolute value,
where the parameter $\kappa$ is also an input parameter ranging between 0 and 1. Thus, we would like
the solution to be well-connected with or to lie near the seed vector $s$. In particular, as displayed
pictorially in Figure 2, $x$ must lie within the spherical cap centered at $s$ that contains all vectors at
an angle of at most $\arccos(\sqrt{\kappa})$ from $s$. Thus, higher values of $\kappa$ demand a higher correlation with $s$
and, hence, a stronger localization. Note that in the limit $\kappa = 0$, the spherical cap constituting the
feasible region of the program is guaranteed to include $v_2$ and LocalSpectral$(G, s, \kappa)$ is equivalent to
Spectral$(G)$. In the rest of this paper, we refer to $s$ as the seed vector and to $\kappa$ as the correlation
parameter for a given LocalSpectral$(G, s, \kappa)$ optimization problem. Moreover, we denote the
objective value of the program LocalSpectral$(G, s, \kappa)$ by the number $\lambda(G, s, \kappa)$.




Figure 2: (Best seen in color.) Pictorial representation of the feasible regions of the optimization
programs Spectral$(G)$ and LocalSpectral$(G, s, \kappa)$ that are defined in Figure 1. See the text for a
discussion.



3.2 Characterization of the Optimal Solutions of LocalSpectral 

Our first theorem is a characterization of the optimal solutions of LocalSpectral. Although
LocalSpectral is a non-convex program (as, of course, is Spectral), the following theorem states that
solutions to it can be expressed as the solution to a system of linear equations which has a natural
interpretation. The proof of this theorem (which may be found in Section 3.4) will involve
a relaxation of the non-convex program LocalSpectral to a convex semidefinite program (SDP),
i.e., the variables in the optimization program will be distributions over vectors rather than the
vectors themselves. For the statement of this theorem, recall that $A^+$ denotes the (uniquely
defined) Moore-Penrose pseudoinverse of the matrix $A$.

Theorem 1 (Solution Characterization) Let $s \in \mathbb{R}^V$ be a seed vector such that $s^T D_G 1 = 0$,
$s^T D_G s = 1$, and $s^T D_G v_2 \neq 0$, where $v_2$ is the second generalized eigenvector of $L_G$ with respect
to $D_G$. In addition, let $1 > \kappa \geq 0$ be a correlation parameter, and let $x^*$ be an optimal solution
to LocalSpectral$(G, s, \kappa)$. Then, there exists some $\gamma \in (-\infty, \lambda_2(G))$ and a $c \in [0, \infty]$ such that

$x^* = c (L_G - \gamma D_G)^+ D_G s.$    (1)

There are several parameters (such as $s$, $\kappa$, $\gamma$, and $c$) in the statement of Theorem 1, and
understanding their relationship is important: $s$ and $\kappa$ are the parameters of the program; $c$ is a
normalization factor that rescales the norm of the solution vector to be 1 (and that can be
computed in linear time, given the solution vector); and $\gamma$ is implicitly defined by $\kappa$, $G$, and $s$. The
correct setting of $\gamma$ ensures that $(s^T D_G x^*)^2 = \kappa$, i.e., that $x^*$ is found exactly on the boundary of
the feasible region. At this point, it is important to notice the behavior of $x^*$ and $\gamma$ as $\kappa$ changes.
As $\kappa$ goes to 1, $\gamma$ tends to $-\infty$ and $x^*$ approaches $s$; conversely, as $\kappa$ goes to 0, $\gamma$ goes to $\lambda_2(G)$
and $x^*$ tends towards $v_2$, the global eigenvector. We will discuss how to compute $\gamma$ and $x^*$, given
a specific $\kappa$, in Section 3.3.
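
The interplay between $\gamma$ and the correlation with the seed can be observed numerically. The sketch below evaluates the right-hand side of Equation (1) for a few fixed values of $\gamma$ and reports the squared correlation $(x^T D_G s)^2$. The toy graph, the single-vertex seed, and the helper names `seed_from_vertex` and `local_vector` are assumptions made for illustration; the dense pseudoinverse stands in for a proper linear solver, and the sketch only uses $\gamma \neq 0$.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): two triangles joined by edge {2,3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
vol_G = d.sum()


def seed_from_vertex(u):
    """Seed built from vertex u, scaled so s^T D s = 1 and s^T D 1 = 0."""
    ind = np.zeros(n)
    ind[u] = 1.0
    scale = np.sqrt(d[u] * (vol_G - d[u]) / vol_G)
    return scale * (ind / d[u] - (1.0 - ind) / (vol_G - d[u]))


def local_vector(gamma, s):
    """x = c (L - gamma D)^+ D s, with c chosen so that x^T D x = 1."""
    y = np.linalg.pinv(L - gamma * D) @ (D @ s)
    return y / np.sqrt(y @ D @ y)


s = seed_from_vertex(0)
gammas = [-100.0, -1.0, 0.1]     # all below lambda_2 of this graph (about 0.2)
corrs = [(local_vector(g, s) @ D @ s) ** 2 for g in gammas]
```

Very negative $\gamma$ gives a vector almost identical to the seed, and the correlation drops as $\gamma$ moves up towards $\lambda_2(G)$, in line with the limiting behavior described above.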



Finally, we should note that there is a close connection between the solution vector x* and 
the popular PageRank procedure. Recall that PageRank refers to a method to determine a global 
rank or global notion of importance for a node in a graph such as the web that is based on the 
link structure of the graph [U HHJ [5] . There have been several extensions to the basic PageRank 
concept, including Topic-Sensitive PageRank [2] and Personalized PageRank [15]. In the same
way that PageRank can be viewed as a way to express the quality of a web page over the entire
web, Personalized PageRank expresses a link-based measure of page quality around user-selected
pages. In particular, given a vector $s \in \mathbb{R}^V$ and a teleportation constant $\alpha > 0$, the Personalized
PageRank vector can be written as $\mathrm{pr}_{\alpha,s} = \left(L_G + \frac{1-\alpha}{\alpha} D_G\right)^{-1} D_G s$ [1]. By setting $\gamma = -\frac{1-\alpha}{\alpha}$,
one sees that the optimal solution to LocalSpectral is a generalization of Personalized PageRank.
In particular, this means that for high values of the correlation parameter $\kappa$, for which the
corresponding $\gamma$ in Theorem 1 is negative, the optimal solution to LocalSpectral takes the form
of a Personalized PageRank vector. On the other hand, when $\gamma > 0$, the optimal solution to
LocalSpectral provides a smooth way of transitioning from the Personalized PageRank vector to
the global second eigenvector $v_2$.
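
The link to random walks can be made explicit with a short calculation. The following is a sketch for the case $\gamma < 0$; the symbols $W$ (the random-walk averaging operator) and $\beta$ are introduced here only for illustration and are not notation used elsewhere in the text.

```latex
\begin{align*}
L_G - \gamma D_G &= D_G \bigl( (1-\gamma) I - W \bigr),
  \qquad W := D_G^{-1} A_G, \\
(L_G - \gamma D_G)^{-1} D_G s &= \bigl( (1-\gamma) I - W \bigr)^{-1} s
  = \beta \sum_{t \ge 0} \beta^{t} \, W^{t} s,
  \qquad \beta := \frac{1}{1-\gamma} \in (0, 1).
\end{align*}
```

The series converges because the eigenvalues of $W$ lie in $[-1, 1]$ and $\beta < 1$. For negative $\gamma$ the vector in Equation (1) is therefore a geometrically damped combination of walk steps started from $s$, which is the general shape of a Personalized PageRank computation.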

3.3 Computation of the Optimal Solutions of LocalSpectral 

In this section, we discuss how to compute efficiently an optimal solution for LocalSpectral$(G, s, \kappa)$,
for a fixed choice of the parameters $G$, $s$, and $\kappa$. The following theorem is our main result.

Theorem 2 (Solution Computation) For any $\epsilon > 0$, a solution to LocalSpectral$(G, s, \kappa)$ of
value at most $(1+\epsilon) \cdot \lambda(G, s, \kappa)$ can be computed in time $\tilde{O}(m/\sqrt{\lambda_2(G)} \cdot \log(1/\epsilon))$ using the Conjugate
Gradient Method jllSf. Alternatively, such a solution can be computed in time $\tilde{O}(m \log(1/\epsilon))$ using
the Spielman-Teng linear-equation solver [30].

Proof: By Theorem 1, we know that the optimal solution $x^*$ must be a unit-scaled version of
$y(\gamma) = (L_G - \gamma D_G)^+ D_G s$, for an appropriate choice of $\gamma \in (-\infty, \lambda_2(G))$. Notice that, given
a fixed $\gamma$, the task of computing $y(\gamma)$ is equivalent to solving the system of linear equations
$(L_G - \gamma D_G) y = D_G s$ for the unknown $y$. This operation can be performed, up to accuracy $\epsilon$, in
time $\tilde{O}(m/\sqrt{\lambda_2(G)} \cdot \log(1/\epsilon))$ using the Conjugate Gradient Method, or in time $\tilde{O}(m \log(1/\epsilon))$ using
the Spielman-Teng linear-equation solver. To find the correct setting of $\gamma$, it suffices to perform
a binary search over the possible values of $\gamma$ in the interval $(-\mathrm{vol}(G), \lambda_2(G))$, until $(s^T D_G x)^2$ is
sufficiently close to $\kappa$.

□
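
The binary search described in the proof can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the running-time bounds: it uses a dense pseudoinverse in place of Conjugate Gradient or the Spielman-Teng solver, it relies on the squared correlation being non-increasing in $\gamma$ on the search interval, and it assumes $\kappa$ is larger than $(v_2^T D_G s)^2$ so that the target is reached strictly below $\lambda_2(G)$. The toy graph and the function names are hypothetical.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): two triangles joined by edge {2,3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
vol_G = d.sum()

# lambda_2(G) from the normalized Laplacian (dense solver, fine for a toy graph).
d_isqrt = 1.0 / np.sqrt(d)
lam2 = np.linalg.eigvalsh(d_isqrt[:, None] * L * d_isqrt[None, :])[1]

# Seed built from vertex u = 0, scaled so s^T D s = 1 and s^T D 1 = 0.
u = 0
ind = np.zeros(n)
ind[u] = 1.0
s = np.sqrt(d[u] * (vol_G - d[u]) / vol_G) * (ind / d[u] - (1.0 - ind) / (vol_G - d[u]))


def local_vector(gamma):
    """Unit-scaled y(gamma) = (L - gamma D)^+ D s."""
    y = np.linalg.pinv(L - gamma * D) @ (D @ s)
    return y / np.sqrt(y @ D @ y)


def solve_local_spectral(kappa, iters=60):
    """Binary search for gamma in (-vol(G), lambda_2) with (s^T D x)^2 close to kappa."""
    lo, hi = -vol_G, lam2
    for _ in range(iters):
        gamma = 0.5 * (lo + hi)
        x = local_vector(gamma)
        if (x @ D @ s) ** 2 > kappa:
            lo = gamma           # more correlated than required: raise gamma
        else:
            hi = gamma           # not correlated enough: lower gamma
    return gamma, x


kappa = 0.8
gamma_star, x_star = solve_local_spectral(kappa)
achieved = (x_star @ D @ s) ** 2
```

Each iteration costs one linear solve, so the total cost is the solver cost times the number of bisection steps, which is logarithmic in the desired precision for $\gamma$.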

We should note that, depending on the application, other methods of computing a solution to
LocalSpectral$(G, s, \kappa)$ might be more appropriate. In particular, if an eigenvector decomposition
of $L_G$ has been pre-computed, as is the case in certain machine learning and data analysis
applications, then this computation can be modified as follows. Given an eigenvector decomposition
of $L_G$ as $L_G = \sum_{i=2}^n \lambda_i D_G^{1/2} u_i u_i^T D_G^{1/2}$ (with $\lambda_i$ and $u_i$ the eigenvalues and orthonormal
eigenvectors of the normalized Laplacian), $y(\gamma)$ must take the form

$y(\gamma) = (L_G - \gamma D_G)^+ D_G s = \sum_{i=2}^n \frac{u_i^T D_G^{1/2} s}{\lambda_i - \gamma} D_G^{-1/2} u_i,$

for the same choice of $c$ and $\gamma$ as in Theorem 1. Hence, given the eigenvector decomposition, each
guess $y(\gamma)$ of the binary search can be computed by expanding the above series, which requires
a linear number of inner product computations. While this may yield a worse running time than
Theorem 2 in the worst case, in the case that the graph is well-approximated by a small number
$k$ of dominant eigenvectors, then the computation is reduced to only $k$ straightforward inner
product computations.
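
The eigenvector-based route can be checked against the direct solve on a small example. The sketch below assumes the $u_i$ are orthonormal eigenvectors of the normalized Laplacian $D_G^{-1/2} L_G D_G^{-1/2}$ and expands $y(\gamma)$ as a sum of terms $\frac{u_i^T D_G^{1/2} s}{\lambda_i - \gamma} D_G^{-1/2} u_i$; the toy graph, the seed, and the name `y_spectral` (with its optional truncation argument `num_vecs`) are hypothetical.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): two triangles joined by edge {2,3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
vol_G = d.sum()

# Seed built from vertex 0, scaled so s^T D s = 1 and s^T D 1 = 0.
ind = np.zeros(n)
ind[0] = 1.0
s = np.sqrt(d[0] * (vol_G - d[0]) / vol_G) * (ind / d[0] - (1.0 - ind) / (vol_G - d[0]))

# Pre-computed eigendecomposition of the normalized Laplacian.
d_sqrt = np.sqrt(d)
lams, U = np.linalg.eigh(L / np.outer(d_sqrt, d_sqrt))   # columns of U are u_1..u_n
coef = U.T @ (d_sqrt * s)                                # inner products u_i^T D^{1/2} s


def y_spectral(gamma, num_vecs=None):
    """Series for y(gamma); the trivial eigenpair (index 0) is skipped.

    num_vecs optionally truncates the series to the lowest non-trivial
    eigenvectors, which is only an approximation.
    """
    stop = n if num_vecs is None else 1 + num_vecs
    return sum(coef[i] / (lams[i] - gamma) * U[:, i] / d_sqrt for i in range(1, stop))


gamma = -0.5
y_series = y_spectral(gamma)
y_direct = np.linalg.pinv(L - gamma * D) @ (D @ s)
```

Once `coef` is stored, trying a new value of $\gamma$ only rescales the coefficients, which is what makes this route attractive inside the binary search.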

3.4 Proof of Theorem 1

We start with an outline of the proof. Although the program LocalSpectral$(G, s, \kappa)$ is not convex,
it can be relaxed to the convex semidefinite program $\mathrm{SDP}_p(G, s, \kappa)$ of Figure 3. Then, one
can observe that strong duality holds for this SDP relaxation. Using strong duality and the
related complementary slackness conditions, one can argue that the primal $\mathrm{SDP}_p(G, s, \kappa)$ has a
rank one unique optimal solution under the conditions of the theorem. This implies that the
optimal solution of $\mathrm{SDP}_p(G, s, \kappa)$ is the same as the optimal solution of LocalSpectral$(G, s, \kappa)$.
Moreover, combining this fact with the complementary slackness condition obtained from the
dual $\mathrm{SDP}_d(G, s, \kappa)$ of Figure 3, one can derive that the optimal rank one solution is of the form
promised by Theorem 1.

Left: $\mathrm{SDP}_p(G, s, \kappa)$

    minimize    $L_G \circ X$
    subject to  $L_{K_n} \circ X = 1$
                $(D_G s)(D_G s)^T \circ X \geq \kappa$
                $X \succeq 0$

Right: $\mathrm{SDP}_d(G, s, \kappa)$

    maximize    $\alpha + \kappa \beta$
    subject to  $L_G \succeq \alpha L_{K_n} + \beta (D_G s)(D_G s)^T$
                $\beta \geq 0$
                $\alpha \in \mathbb{R}$

Figure 3: Left: Primal SDP relaxation of LocalSpectral$(G, s, \kappa)$: $\mathrm{SDP}_p(G, s, \kappa)$; for this primal,
the optimization variable is $X \in \mathbb{R}^{V \times V}$ such that $X$ is symmetric and positive semidefinite. Right:
Dual SDP relaxation of LocalSpectral$(G, s, \kappa)$: $\mathrm{SDP}_d(G, s, \kappa)$; for this dual, the optimization
variables are $\alpha, \beta \in \mathbb{R}$. Recall that $L_{K_n} := D_G - \frac{1}{\mathrm{vol}(G)} D_G 1 1^T D_G$.

Before proceeding with the details of the proof, we pause to make several points that should 
help to clarify our approach. 

• First, since it may seem to some readers to be unnecessarily complex to relax LocalSpectral
as an SDP, we emphasize that the motivation for relaxing it in this way is that we would
like to prove Theorem 1. To prove this theorem, we must understand the form of the
optimal solutions to the non-convex program LocalSpectral. Thus, in order to overcome the
non-convexity, we relax LocalSpectral to $\mathrm{SDP}_p(G, s, \kappa)$ (of Figure 3) by "lifting" the rank-1
condition implicit in LocalSpectral. Then, strong duality applies; and it implies a set of
sufficient optimality conditions. By combining these conditions, we will be able to establish
that an optimal solution $X^*$ to $\mathrm{SDP}_p(G, s, \kappa)$ has rank 1, i.e., it has the form $X^* = x^* x^{*T}$ for
some vector $x^*$; and thus it yields an optimal solution to LocalSpectral, i.e., the vector $x^*$.

• Second, in general, the value of a relaxation like $\mathrm{SDP}_p(G, s, \kappa)$ may be strictly less than
that of the original program (LocalSpectral, in this case). Our characterization and proof
will imply that the relaxation is tight, i.e., that the optimum of $\mathrm{SDP}_p(G, s, \kappa)$ equals that
of LocalSpectral. The reason is that one can find a rank-1 optimal solution to $\mathrm{SDP}_p(G, s, \kappa)$,
which then yields an optimal solution of the same value for LocalSpectral. Note that this also
implies that strong duality holds for the non-convex LocalSpectral, although this observation
is not needed for our proof.

That is, although it may be possible to prove Theorem 1 in some other way that does not involve
SDPs, we chose this proof since it is simple and intuitive and correct; and we note that Appendix B
in the textbook of Boyd and Vandenberghe [7] proves a similar statement by the same SDP-based
approach.

Returning to the details of the proof, we will proceed to prove the theorem by establishing
a sequence of claims. First, consider $\mathrm{SDP}_p(G, s, \kappa)$ and its dual $\mathrm{SDP}_d(G, s, \kappa)$ (as shown in
Figure 3). The following claim uses the fact that, given $X = xx^T$ for $x \in \mathbb{R}^V$, and for any matrix
$A \in \mathbb{R}^{V \times V}$, we have that $A \circ X = x^T A x$. In particular, $L_G \circ X = x^T L_G x$, for any graph $G$, and
$(x^T D_G s)^2 = x^T D_G s s^T D_G x = D_G s s^T D_G \circ X$.

Claim 1 The primal $\mathrm{SDP}_p(G, s, \kappa)$ is a relaxation of the vector program LocalSpectral$(G, s, \kappa)$.

Proof: Consider a vector $x$ that is a feasible solution to LocalSpectral$(G, s, \kappa)$, and note that
$X = xx^T$ is a feasible solution to $\mathrm{SDP}_p(G, s, \kappa)$.

□
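
The lifting step in Claim 1 can be checked numerically. The sketch below takes a vector that is feasible for LocalSpectral (here the seed itself, which satisfies the correlation constraint for every $\kappa \leq 1$), forms $X = xx^T$, and evaluates the primal SDP constraints using $A \circ X = \mathrm{Tr}(A^T X)$. The toy graph and variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): two triangles joined by edge {2,3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
vol_G = d.sum()

# Laplacian of the weighted complete graph: L_Kn = D - (1/vol(G)) D 1 1^T D.
L_Kn = D - np.outer(d, d) / vol_G

# Seed built from vertex 0, scaled so s^T D s = 1 and s^T D 1 = 0.
ind = np.zeros(n)
ind[0] = 1.0
s = np.sqrt(d[0] * (vol_G - d[0]) / vol_G) * (ind / d[0] - (1.0 - ind) / (vol_G - d[0]))

x = s                                     # a feasible vector for LocalSpectral
X = np.outer(x, x)                        # rank-one lifting X = x x^T

norm_constraint = np.trace(L_Kn.T @ X)    # L_Kn o X, equals x^T D x - (x^T D 1)^2 / vol(G)
corr_matrix = np.outer(D @ s, D @ s)      # (D s)(D s)^T
corr_constraint = np.trace(corr_matrix.T @ X)   # equals (x^T D s)^2
objective = np.trace(L.T @ X)             # L_G o X, equals x^T L x
```

The single equality $L_{K_n} \circ X = 1$ encodes both the unit-norm and the orthogonality constraints of the vector program when $X$ has rank one.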

Next, we establish the strong duality of $\mathrm{SDP}_p(G, s, \kappa)$. (Note that the feasibility conditions and
complementary slackness conditions stated below may not suffice to establish the optimality, in
the absence of this claim; hence, without this claim, we could not prove the subsequent claims,
which are needed to prove the theorem.)

Claim 2 Strong duality holds between $\mathrm{SDP}_p(G, s, \kappa)$ and $\mathrm{SDP}_d(G, s, \kappa)$.

Proof: Since $\mathrm{SDP}_p(G, s, \kappa)$ is convex, it suffices to verify that Slater's constraint qualification
condition [7] is true for this primal SDP. Consider $X = ss^T$. Then, $(D_G s)(D_G s)^T \circ ss^T =
(s^T D_G s)^2 = 1 > \kappa$.

□

Next, we use this result to establish the following two claims. In particular, strong duality allows 
us to prove the following claim showing the KKT-conditions, i.e., the feasibility conditions and 
complementary slackness conditions stated below, suffice to establish optimality. 

Claim 3 The following feasibility and complementary slackness conditions are sufficient for a
primal-dual pair $X^*$, $\alpha^*$, $\beta^*$ to be an optimal solution. The feasibility conditions are:

$L_{K_n} \circ X^* = 1$    (2)

$(D_G s)(D_G s)^T \circ X^* \geq \kappa$    (3)

$L_G - \alpha^* L_{K_n} - \beta^* (D_G s)(D_G s)^T \succeq 0$    (4)

$\beta^* \geq 0,$    (5)

and the complementary slackness conditions are:

$\alpha^* (L_{K_n} \circ X^* - 1) = 0$    (6)

$\beta^* ((D_G s)(D_G s)^T \circ X^* - \kappa) = 0$    (7)

$X^* \circ (L_G - \alpha^* L_{K_n} - \beta^* (D_G s)(D_G s)^T) = 0.$    (8)

Proof: This follows from the convexity of $\mathrm{SDP}_p(G, s, \kappa)$ and Slater's condition [7].

□

Claim 4 These feasibility and complementary slackness conditions, coupled with the assumptions
of the theorem, imply that $X^*$ must be rank 1 and $\beta^* > 0$.

Proof: Plugging in $v_2$ in Equation (4), we obtain that $v_2^T L_G v_2 - \alpha^* - \beta^* (v_2^T D_G s)^2 \geq 0$. But
$v_2^T L_G v_2 = \lambda_2(G)$ and $\beta^* \geq 0$. Hence, $\lambda_2(G) \geq \alpha^*$. Suppose $\alpha^* = \lambda_2(G)$. As $s^T D_G v_2 \neq 0$, it must
be the case that $\beta^* = 0$. Hence, by Equation (8), we must have $X^* \circ L_G = \lambda_2(G)$, which implies
that $X^* = v_2 v_2^T$, i.e., the optimum for LocalSpectral is the global eigenvector $v_2$. This corresponds
to a choice of $\gamma = \lambda_2(G)$ and $c$ tending to infinity.

Otherwise, we may assume that $\alpha^* < \lambda_2(G)$. Hence, since $G$ is connected and $\alpha^* < \lambda_2(G)$,
$L_G - \alpha^* L_{K_n}$ has rank exactly $n - 1$ and kernel parallel to the vector $1$. From the complementary
slackness condition (8) we can deduce that the image of $X^*$ is in the kernel of $L_G - \alpha^* L_{K_n} -
\beta^* (D_G s)(D_G s)^T$. If $\beta^* > 0$, we have that $\beta^* (D_G s)(D_G s)^T$ is a rank one matrix and, since $s^T D_G 1 =
0$, it reduces the rank of $L_G - \alpha^* L_{K_n}$ by one precisely. If $\beta^* = 0$ then $X^*$ must be 0, which is not
possible if $\mathrm{SDP}_p(G, s, \kappa)$ is feasible. Hence, the rank of $L_G - \alpha^* L_{K_n} - \beta^* (D_G s)(D_G s)^T$ must be
exactly $n - 2$. As we may assume that $1$ is in the kernel of $X^*$, $X^*$ must be of rank one. This
proves the claim.

□

Now we complete the proof of the theorem. From the claim it follows that $X^* = x^* x^{*T}$, where $x^*$
satisfies the equation $(L_G - \alpha^* L_{K_n} - \beta^* (D_G s)(D_G s)^T) x^* = 0$. From the second complementary
slackness condition, Equation (7), and the fact that $\beta^* > 0$, we obtain that $(x^*)^T D_G s = \pm\sqrt{\kappa}$.
Thus, $x^* = \pm \beta^* \sqrt{\kappa} (L_G - \alpha^* L_{K_n})^+ D_G s$, as required.



4 Application to Partitioning Graphs Locally 

In this section, we describe the application of LocalSpectral to finding locally-biased partitions in 
a graph, i.e., to finding sparse cuts around an input seed vertex set in the graph. For simplicity, 
in this part of the paper, we let the instance graph G be unweighted. 



4.1 Background on Global Spectral Algorithms for Partitioning Graphs 

We start with a brief review of global spectral graph partitioning. Recall that the basic global 
graph partitioning problem is: given as input a graph $G = (V, E)$, find a set of nodes $S \subseteq V$ to
solve

$\phi(G) = \min_{S \subseteq V} \phi(S).$

Spectral methods approximate the solution to this intractable global problem by solving the
relaxed problem Spectral$(G)$ presented in Figure 1. To understand this optimization problem,
recall that $x^T L_G x$ counts the number of edges crossing the cut and that $x^T D_G x = 1$ encodes
a variance constraint; thus, the goal of Spectral$(G)$ is to minimize the number of edges crossing
the cut subject to a given variance. Recall that for $T \subseteq V$, we let $1_T \in \{0, 1\}^V$ be a vector
which is 1 for vertices in $T$ and 0 otherwise. Then for a cut $(S, \bar{S})$, if we define the vector

$v_S := \sqrt{\frac{\mathrm{vol}(S) \cdot \mathrm{vol}(\bar{S})}{\mathrm{vol}(G)}} \cdot \left( \frac{1_S}{\mathrm{vol}(S)} - \frac{1_{\bar{S}}}{\mathrm{vol}(\bar{S})} \right),$

it can be checked that $v_S$ satisfies the constraints of Spectral
and has objective value $\phi(S)$. Thus, $\lambda_2(G) \leq \min_{S \subseteq V} \phi(S) = \phi(G)$.
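
The claim that the cut vector is feasible for Spectral(G) with objective value equal to the conductance can be verified directly. The following is a small check on a hypothetical toy graph; the helper name `cut_vector` is an assumption made for illustration.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): two triangles joined by edge {2,3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
vol_G = d.sum()


def cut_vector(S):
    """v_S = sqrt(vol(S) vol(S-bar) / vol(G)) * (1_S / vol(S) - 1_{S-bar} / vol(S-bar))."""
    ind = np.zeros(n)
    ind[list(S)] = 1.0
    vol_S = d @ ind
    vol_Sc = vol_G - vol_S
    return np.sqrt(vol_S * vol_Sc / vol_G) * (ind / vol_S - (1.0 - ind) / vol_Sc)


S = [0, 1, 2]
v_S = cut_vector(S)

unit_norm = v_S @ D @ v_S        # constraint x^T D x = 1
orthogonality = v_S @ d          # constraint x^T D 1 = 0
objective = v_S @ L @ v_S        # should equal phi(S)

# Conductance of S computed from its definition, for comparison.
mask = np.zeros(n, dtype=bool)
mask[S] = True
phi_S = vol_G * A[mask][:, ~mask].sum() / (d[mask].sum() * d[~mask].sum())

# lambda_2(G) from the normalized Laplacian, to confirm the relaxation bound.
d_isqrt = 1.0 / np.sqrt(d)
lam2 = np.linalg.eigvalsh(d_isqrt[:, None] * L * d_isqrt[None, :])[1]
```

Because every cut vector is feasible, the minimum of the relaxed program can only be smaller than the best conductance, which is the inequality stated above.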

Hence, Spectral$(G)$ is a relaxation of the minimum conductance problem. Moreover, this
program is a good relaxation in that a good cut can be recovered by considering a truncation, i.e.,
a sweep cut, of the vector $v_2$ that is the optimal solution to Spectral$(G)$. (That is, e.g., consider
each of the $n$ cuts defined by the vector $v_2$, and return the cut with minimum conductance value.)
This is captured by the following celebrated result often referred to as Cheeger's Inequality.

Theorem 3 (Cheeger's Inequality) For a connected graph $G$, $\phi(G) \leq O(\sqrt{\lambda_2(G)})$.

Although there are many proofs known for this theorem (see, e.g., [9]), a particularly interesting
proof was found by Mihail [25]: this proof involves rounding any test vector (rather than just the
optimal vector), and it achieves the same guarantee as Cheeger's Inequality.

Theorem 4 (Sweep Cut Rounding) Let $x$ be a vector such that $x^T D_G 1 = 0$. Then there is
a $t$ for which the set of vertices $S := \mathrm{SweepCut}_t(x) := \{i : x_i \geq t\}$ satisfies $\frac{x^T L_G x}{x^T D_G x} \geq \phi^2(S)/8$.

It is the form of Cheeger's Inequality provided by Theorem 4 that we will use below.
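
A sweep cut can be sketched in a few lines: sort the vertices by their value in $x$, scan the prefixes of that order, and keep the prefix of smallest conductance. The code below is an illustrative sketch on a hypothetical toy graph, applied to the global eigenvector $v_2$; the function names are assumptions. When entries of $x$ are tied, the prefixes include a few more sets than the threshold sets $\{i : x_i \geq t\}$, which can only improve the returned conductance.

```python
import numpy as np

# Hypothetical toy graph (not from the paper): two triangles joined by edge {2,3}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
vol_G = d.sum()


def conductance(mask):
    """phi(S) for the vertex set selected by the boolean mask."""
    cut = A[mask][:, ~mask].sum()
    return vol_G * cut / (d[mask].sum() * d[~mask].sum())


def sweep_cut(x):
    """Return the best-conductance prefix of the vertices sorted by decreasing x."""
    order = np.argsort(-x)
    mask = np.zeros(n, dtype=bool)
    best_phi, best_set = np.inf, None
    for idx in order[:-1]:                   # skip the full vertex set
        mask[idx] = True
        phi = conductance(mask)
        if phi < best_phi:
            best_phi, best_set = phi, set(np.flatnonzero(mask).tolist())
    return best_set, best_phi


# Test vector: the global eigenvector v_2, which satisfies x^T D 1 = 0.
d_isqrt = 1.0 / np.sqrt(d)
evals, evecs = np.linalg.eigh(d_isqrt[:, None] * L * d_isqrt[None, :])
v2 = d_isqrt * evecs[:, 1]

best_set, best_phi = sweep_cut(v2)
rayleigh = (v2 @ L @ v2) / (v2 @ D @ v2)
```

On this graph the sweep recovers one of the two triangles, and the Rayleigh quotient of the test vector is at least $\phi^2(S)/8$ for the returned set, as Theorem 4 requires.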

4.2 Locally-Biased Spectral Graph Partitioning



Here, we will exploit the analogy between Spectral and LocalSpectral by applying the global 
approach just outlined to the following locally-biased graph partitioning problem: given as input 
a graph $G = (V, E)$, an input node $u$, and a positive integer $k$, find a set of nodes $T \subseteq V$ achieving

$\phi(u, k) = \min_{T \subseteq V : u \in T, \mathrm{vol}(T) \leq k} \phi(T).$

That is, the problem is to find the best conductance set of nodes of volume no greater than $k$
that contains the input node $u$.

As a first step, we show that we can choose the seed set and correlation parameters $s$ and $\kappa$
such that LocalSpectral$(G, s, \kappa)$ is a relaxation for this locally-biased graph partitioning problem.



Lemma 1 For $u \in V$, LocalSpectral$(G, v_{\{u\}}, 1/k)$ is a relaxation of the problem of finding a
minimum conductance cut $T$ in $G$ which contains the vertex $u$ and is of volume at most $k$. In
particular, $\lambda(G, v_{\{u\}}, 1/k) \leq \phi(u, k)$.



Proof: If we let $x = v_T$ in LocalSpectral$(G, v_{\{u\}}, 1/k)$, then $v_T^T L_G v_T = \phi(T)$, $v_T^T D_G 1 = 0$, and
$v_T^T D_G v_T = 1$. Moreover, we have that

$$(v_T^T D_G v_{\{u\}})^2 = \frac{d_u (2m - \mathrm{vol}(T))}{\mathrm{vol}(T)(2m - d_u)} \geq 1/k,$$

which establishes the lemma.

□
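
The closed form for the correlation between $v_T$ and the single-vertex seed $v_{\{u\}}$ used in this proof can be checked numerically. The sketch below is illustrative only: it assumes $v_T$ is the normalized cut vector $\sqrt{\mathrm{vol}(T)\mathrm{vol}(\bar{T})/2m}\,(1_T/\mathrm{vol}(T) - 1_{\bar{T}}/\mathrm{vol}(\bar{T}))$, the helper names are hypothetical, and the degree sequence is that of a toy graph made of two triangles joined by one edge.

```python
import math


def cut_vector(deg, T):
    """v_T = sqrt(vol(T) vol(T-bar) / 2m) * (1_T / vol(T) - 1_{T-bar} / vol(T-bar))."""
    two_m = sum(deg.values())
    vol_T = sum(deg[v] for v in T)
    vol_Tc = two_m - vol_T
    scale = math.sqrt(vol_T * vol_Tc / two_m)
    return {v: scale / vol_T if v in T else -scale / vol_Tc for v in deg}


def d_inner(deg, x, y):
    """Degree-weighted inner product x^T D_G y."""
    return sum(deg[v] * x[v] * y[v] for v in deg)


def seed_correlation_closed_form(deg, T, u):
    """d_u (2m - vol(T)) / (vol(T) (2m - d_u)), for a set T containing u."""
    two_m = sum(deg.values())
    vol_T = sum(deg[v] for v in T)
    return deg[u] * (two_m - vol_T) / (vol_T * (two_m - deg[u]))


# Degrees of a toy graph (hypothetical): two triangles joined by the edge 2-3.
deg = {0: 2, 1: 2, 2: 3, 3: 3, 4: 2, 5: 2}
T, u = {0, 1, 2}, 0
v_T, v_u = cut_vector(deg, T), cut_vector(deg, {u})
ones = {v: 1.0 for v in deg}

lhs = d_inner(deg, v_T, v_u) ** 2
rhs = seed_correlation_closed_form(deg, T, u)
print(lhs, rhs)
```

Only the degrees enter this computation, since both vectors are constant on each side of their respective cuts.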

Next, we can apply Theorem 4 to the optimal solution for LocalSpectral$(G, v_{\{u\}}, 1/k)$ and obtain a
cut $T$ whose conductance is quadratically close to the optimal value $\lambda(G, v_{\{u\}}, 1/k)$. By Lemma 1,
this implies that $\phi(T) \leq O(\sqrt{\phi(u,k)})$. This argument proves the following theorem.

Theorem 5 (Finding a Cut) Given an unweighted graph $G = (V,E)$, a vertex $u \in V$ and a
positive integer $k$, we can find a cut in $G$ of conductance at most $O(\sqrt{\phi(u,k)})$ by computing a
sweep cut of the optimal vector for LocalSpectral$(G, v_{\{u\}}, 1/k)$. Moreover, this algorithm runs in
nearly-linear time in the size of the graph.

That is, this theorem states that we can perform a sweep cut over the vector that is the solution
to LocalSpectral$(G, v_{\{u\}}, 1/k)$ in order to obtain a locally-biased partition; and that this partition
comes with quality-of-approximation guarantees analogous to those provided for the global problem
Spectral$(G)$ by Cheeger's Inequality.
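
For concreteness, the chain of inequalities behind Theorem 5 can be written out as follows. This is a sketch under two assumptions that are not restated in this section: that the optimal vector $x^\star$ of LocalSpectral satisfies the normalization $(x^\star)^T D_G x^\star = 1$ and $(x^\star)^T D_G 1 = 0$, and that the constant in Theorem 4 is 8.

```latex
% T is the sweep cut of x^\star guaranteed by Theorem 4.
\phi(T)^2 \;\le\; 8 \cdot \frac{(x^\star)^T L_G\, x^\star}{(x^\star)^T D_G\, x^\star}
          \;=\; 8\,\lambda(G, v_{\{u\}}, 1/k)
          \;\le\; 8\,\phi(u,k)
\quad\Longrightarrow\quad
\phi(T) \;\le\; \sqrt{8\,\phi(u,k)} \;=\; O\!\left(\sqrt{\phi(u,k)}\right).
```

The first inequality is Theorem 4 applied to $x^\star$, the equality is the definition of the optimal value, and the last inequality is Lemma 1.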

Our final theorem shows that the optimal value of LocalSpectral also provides a lower bound 
on the conductance of other cuts, as a function of how well-correlated they are with the input 
seed vector. In particular, when the seed vector corresponds to a cut U, this result allows us to 
lower bound the conductance of an arbitrary cut T, in terms of the correlation between U and 
$T$. The proof of this theorem also uses in an essential manner the duality properties that were
used in the proof of Theorem 1.

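Theorem 6 (Cut Improvement) Let $G$ be a graph and $s \in \mathbb{R}^n$ be such that $s^T D_G 1 = 0$, where
$D_G$ is the degree matrix of $G$. In addition, let $\kappa \geq 0$ be a correlation parameter. Then, for all
sets $T \subseteq V$ such that $\kappa' := (s^T D_G v_T)^2$, we have that

$$\phi(T) \geq \begin{cases} \lambda(G, s, \kappa) & \text{if } \kappa \leq \kappa' \\ \kappa'/\kappa \cdot \lambda(G, s, \kappa) & \text{if } \kappa' \leq \kappa. \end{cases}$$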



Proof: It follows from Theorem 1 that $\lambda(G, s, \kappa)$ is the same as the optimal value of SDP$_p(G, s, \kappa)$
which, by strong duality, is the same as the optimal value of SDP$_d(G, s, \kappa)$. Let $\alpha^*, \beta^*$ be the
optimal dual values to SDP$_d(G, s, \kappa)$. Then, from the dual feasibility constraint
$L_G - \alpha^* L_{K_n} - \beta^* (D_G s)(D_G s)^T \succeq 0$, it follows that

$$s_T^T L_G s_T - \alpha^* s_T^T L_{K_n} s_T - \beta^* (s^T D_G s_T)^2 \geq 0.$$

Notice that since $s_T^T D_G 1 = 0$, it follows that $s_T^T L_{K_n} s_T = s_T^T D_G s_T = 1$. Further, since
$s_T^T L_G s_T = \phi(T)$, we obtain, if $\kappa \leq \kappa'$, that

$$\phi(T) \geq \alpha^* + \beta^* (s^T D_G s_T)^2 \geq \alpha^* + \beta^* \kappa = \lambda(G, s, \kappa).$$

If on the other hand, $\kappa' \leq \kappa$, then

$$\phi(T) \geq \alpha^* + \beta^* (s^T D_G s_T)^2 \geq \alpha^* + \beta^* \kappa' \geq \kappa'/\kappa \cdot (\alpha^* + \beta^* \kappa) = \kappa'/\kappa \cdot \lambda(G, s, \kappa).$$

Note that strong duality was used here.

□

Thus, although the relaxation guarantees of Lemma 1 only hold when the seed set is a single
vertex, we can use Theorem 6 to consider the following problem: given a graph $G$ and a cut
$(T, \bar{T})$ in the graph, find a cut of minimum conductance in $G$ which is well-correlated with $T$ or
certify that there is none. Although one can imagine many applications of this primitive, the
main application that motivated this work was to explore clusters nearby or around a given seed
set of nodes in data graphs. This will be illustrated in our empirical evaluation in Section 5.
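
Read as a recipe, Theorem 6 turns the optimal value $\lambda(G, s, \kappa)$ and the correlation $\kappa'$ of a candidate cut into a lower bound on that cut's conductance. The helper below is a minimal sketch of this piecewise rule; the function name is hypothetical, and the value of $\lambda(G, s, \kappa)$ is assumed to have been computed separately.

```python
def cut_improvement_bound(lam, kappa, kappa_prime):
    """Lower bound on phi(T) from Theorem 6.

    lam         -- optimal value lambda(G, s, kappa), computed elsewhere
    kappa       -- correlation parameter used in LocalSpectral (kappa >= 0)
    kappa_prime -- correlation (s^T D_G v_T)^2 of the candidate cut T with the seed
    """
    if kappa <= kappa_prime:
        return lam
    # Here kappa_prime < kappa, so kappa > 0 and the ratio is well defined.
    return (kappa_prime / kappa) * lam


# A cut at least as correlated with the seed as required inherits the full bound;
# a less correlated cut gets a proportionally weaker one.
print(cut_improvement_bound(0.3, 0.5, 0.8))
print(cut_improvement_bound(0.3, 0.5, 0.25))
```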



4.3 Our Geometric Notion of Correlation Between Cuts 

Here we pause to make explicit the geometric notion of correlation between cuts (or partitions, or 
sets of nodes) that is used by LocalSpectral, and that has already been used in various guises in 
previous sections. Given a cut $(T, \bar{T})$ in a graph $G = (V,E)$, a natural vector in $\mathbb{R}^V$ to associate
with it is its characteristic vector, in which case the correlation between a cut $(T, \bar{T})$ and another
cut $(U, \bar{U})$ can be captured by the inner product of the characteristic vectors of the two cuts. A
somewhat more refined vector to associate with a cut is the vector obtained after removing from
the characteristic vector its projection along the all-ones vector. In that case, again, a notion of
correlation is related to the inner product of two such vectors for two cuts. More precisely, given
a set of nodes $T \subseteq V$, or equivalently a cut $(T, \bar{T})$, one can define the unit vector $s_T$ as

$$s_T(i) := \begin{cases} \sqrt{\mathrm{vol}(T)\mathrm{vol}(\bar{T})/2m} \cdot 1/\mathrm{vol}(T) & \text{if } i \in T \\ -\sqrt{\mathrm{vol}(T)\mathrm{vol}(\bar{T})/2m} \cdot 1/\mathrm{vol}(\bar{T}) & \text{if } i \in \bar{T}. \end{cases}$$

That is, $s_T := \sqrt{\frac{\mathrm{vol}(T)\mathrm{vol}(\bar{T})}{2m}} \left( \frac{1_T}{\mathrm{vol}(T)} - \frac{1_{\bar{T}}}{\mathrm{vol}(\bar{T})} \right)$, which is exactly the vector defined in Section 4.1.
It is easy to check that this is well defined: one can replace $s_T$ by $s_{\bar{T}}$ and the correlation remains
the same with any other set. Moreover, several observations are immediate. First, defined this
way, it immediately follows that $s_T^T D_G 1 = 0$ and that $s_T^T D_G s_T = 1$. Thus, $s_T \in S_D$ for $T \subseteq V$,
where we denote by $S_D$ the set of vectors $\{x \in \mathbb{R}^V : x^T D_G 1 = 0\}$; and $s_T$ can be seen as an
appropriately normalized version of the vector consisting of the uniform distribution over $T$ minus
the uniform distribution over $\bar{T}$.$^1$ Second, one can introduce the following measure of correlation
between two sets of nodes, or equivalently between two cuts, say a cut $(T, \bar{T})$ and a cut $(U, \bar{U})$:

$$K(T,U) := (s_T^T D_G s_U)^2.$$



$^1$ Notice also that $s_T = -s_{\bar{T}}$. Thus, since we only consider quadratic functions of $s_T$, we can consider both $s_T$
and $s_{\bar{T}}$ to be representative vectors for the cut $(T, \bar{T})$.

The proofs of the following simple facts regarding $K(T,U)$ are omitted: $K(T,U) \in [0,1]$;
$K(T,U) = 1$ if and only if $T = U$ or $\bar{T} = U$; $K(T,U) = K(\bar{T},U)$; and $K(T,U) = K(T,\bar{U})$.
Third, although we have described this notion of geometric correlation in terms of vectors of the
form $s_T \in S_D$ that represent partitions $(T, \bar{T})$, this correlation is clearly well-defined for other
vectors $s \in S_D$ for which there is not such a simple interpretation in terms of cuts. Indeed, in
Section 3 we considered the case that $s$ was an arbitrary vector in $S_D$, while in the first part
of Section 4.2 we considered the case that $s$ was the seed set of a single node. In our empirical
evaluation in Section 5, we will consider both of these cases as well as the case that $s$ encodes the
correlation with cuts consisting of multiple nodes.
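
The correlation measure $K(T,U)$ and the simple facts listed above are easy to check on a small example. The following sketch is illustrative only: the helper names and the degree sequence (two triangles joined by a single edge) are hypothetical, and only vertex degrees are needed because each $s_T$ is constant on $T$ and on $\bar{T}$.

```python
import math


def cut_vector(deg, T):
    """s_T: +scale/vol(T) on T and -scale/vol(T-bar) on T-bar, scale = sqrt(vol(T) vol(T-bar) / 2m)."""
    two_m = sum(deg.values())
    vol_T = sum(deg[v] for v in T)
    vol_Tc = two_m - vol_T
    scale = math.sqrt(vol_T * vol_Tc / two_m)
    return {v: scale / vol_T if v in T else -scale / vol_Tc for v in deg}


def correlation(deg, T, U):
    """K(T, U) = (s_T^T D_G s_U)^2 for non-empty proper subsets T and U."""
    s_T, s_U = cut_vector(deg, T), cut_vector(deg, U)
    return sum(deg[v] * s_T[v] * s_U[v] for v in deg) ** 2


def complement(deg, T):
    return set(deg) - set(T)


# Degrees of a toy graph (hypothetical): two triangles joined by the edge 2-3.
deg = {0: 2, 1: 2, 2: 3, 3: 3, 4: 2, 5: 2}
T = {0, 1, 2}

print(correlation(deg, T, T))                    # a cut is perfectly correlated with itself
print(correlation(deg, T, complement(deg, T)))   # and with its complement
print(correlation(deg, T, {0}))                  # partial overlap
print(correlation(deg, T, {2, 3}))               # a cut that straddles T evenly by volume
```

In this example the set $\{2, 3\}$ splits its volume evenly between $T$ and $\bar{T}$, so its correlation with $T$ vanishes.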

5 Empirical Evaluation 

In this section, we provide an empirical evaluation of LocalSpectral by illustrating its use in finding
and evaluating locally-biased low-conductance cuts, i.e., sparse cuts or good clusters, around an
input seed set of nodes in a data graph. We start with a brief discussion of a very recent and
pictorially-compelling application of our method to a computer vision problem; and then we
discuss in detail how our method can be applied to identify clusters and communities in a more
heterogeneous and more difficult-to-visualize social network application.

5.1 Semi-Supervised Image Segmentation 

Subsequent to the initial dissemination of the technical report version of this paper, Maji, Vishnoi, 
and Malik [24] applied our methodology to the problem of finding locally-biased cuts in a computer 
vision application. Recall that image segmentation is the problem of partitioning a digital image 
into segments corresponding to significant objects and areas in the image. A standard approach
consists of converting the image data into a similarity graph over the pixels and applying a
graph partitioning algorithm to identify relevant segments. In particular, spectral methods have
been popular in this area since the work of Shi and Malik [29], which used the second eigenvector
of the graph to approximate the so-called normalized cut (which, recall, is an objective measure
for image segmentation that is practically equivalent to conductance). However, a difficulty in
applying the normalized cut method is that in many cases global eigenvectors may fail to capture
important local segments of the image. The reason for this is that they aggressively optimize
a global objective function and thus they tend to combine multiple segments together; this is
illustrated pictorially in the first row of Figure 4.

This difficulty can be overcome in a semi-supervised scenario by using our LocalSpectral
method. Specifically, one often has a small number of "ground truth" labels that correspond
to known segments, and one is interested in extracting and refining the segments in which those
labels reside. In this case, if one considers an input seed corresponding to a small number of
pixels within a target object, then LocalSpectral can recover the corresponding segment with high
precision. This is illustrated in the second row of Figure 4. This computer vision application of
our methodology was motivated by a preliminary version of this paper, and it was described in
detail and evaluated against competing algorithms by Maji, Vishnoi, and Malik [23]. In particular,
they show that LocalSpectral achieves a performance superior to that of other semi-supervised
segmentation algorithms [32, 11]; and they also show how LocalSpectral can be incorporated
in an unsupervised segmentation pipeline by using as input seed distributions obtained by an
object-detector algorithm [6].

Figure 4: The first row shows the input image and the three smallest eigenvectors of the Laplacian
of the corresponding similarity graph computed using the intervening contour cue [23]. Note that
no sweep cut of these eigenvectors reveals the leopard. The second row shows the results of
LocalSpectral with a setting of $\gamma = -10\lambda_2(G)$ with the seed pixels highlighted by crosshairs. Note
how one can recover the leopard by using a seed vector representing a set of only 4 pixels.
In addition, note how the first seed pixel allows us to capture the head of the animal, while the
other seeds help reveal other parts of its body.

5.2 Detecting Communities in Social Networks 

Finding local clusters and meaningful locally-biased communities is also of interest in the analysis
of large social and information networks. A standard approach to finding clusters and communities
in many network analysis applications is to formalize the idea of a good community with an
"edge counting" metric such as conductance or modularity and then to use a spectral relaxation
to optimize it approximately [26, 27]. For many very large social and information networks,
however, there simply do not exist good large global clusters, but there do exist small meaningful
local clusters that may be thought of as being nearby prespecified seed sets of nodes [20, 21, 22].
In these cases, a local version of the global spectral partitioning problem is of interest, as was
shown by Leskovec, Lang, and Mahoney [22]. Typical networks are very large and, due to their
expander-like properties, are not easily visualizable [20, 21]. Thus, in order to illustrate the
empirical behavior of our LocalSpectral methodology in a "real" network application related to
the one that motivated this work [20, 21, 22], we examined a small "coauthorship network" of
scientists. This network was previously used by Newman [26] to study community structure in
small social and information networks.

The corresponding graph $G$ is illustrated in Figure 5 and consists of 379 nodes and 914
edges, where each node represents an author and each unweighted edge represents a coauthorship
relationship. The spectral gap is $\lambda_2(G) = 0.0029$; and a sweep cut of the eigenvector corresponding
to this second eigenvalue yields the globally-optimal spectral cut separating the graph into two
well-balanced partitions, corresponding to the left half and the right half of the network, as shown
in Figure 5. Our main empirical observations, described in detail in the remainder of this section,
are the following.

• First, we show how varying the teleportation parameter allows us to detect low-conductance
cuts of different volumes that are locally-biased around a prespecified seed vertex; and
how this information, aggregated over multiple choices of teleportation, can improve our
understanding of the network structure in the neighborhood of the seed.

Figure 5: [Best viewed in color.] The coauthorship network of Newman [26]. This layout was
obtained in the Pajek [1] visualization software, using the Kamada-Kawai method [16] on each
component of a partition provided by LocalCut and tiling the layouts at the end. Boxes show the
two main global components of the network, which are displayed separately in subsequent figures.

• Second, we demonstrate the more general usefulness of our definition of a generalized
Personalized PageRank vector (where the $\gamma$ parameter in Eqn. (1) can be $\gamma \in (-\infty, \lambda_2(G))$)
by displaying specific instances in which that vector is more effective than the usual
Personalized PageRank (where only positive teleportation probabilities are allowed and thus
where $\gamma$ must be negative). We do this by detecting a wider range of low-conductance cuts
at a given volume and by interpolating smoothly between very locally-biased solutions to
LocalSpectral and the global solution provided by the Spectral program.

• Third, we demonstrate how our method can find low-conductance cuts that are well- 
correlated to more general input seed vectors by demonstrating an application to the de- 
tection of sparse peripheral regions, e.g., regions of the network that are well-correlated 
with low-degree nodes. This suggests that our method may find applications in leveraging 
feature data, which are often associated with the vertices of a data graph, to find interesting 
and meaningful cuts. 

We emphasize that the goal of this empirical evaluation is to illustrate how our proposed method- 
ology can be applied in real applications; and thus we work with a relatively easy-to-visualize 
example of a small social graph. This will allow us to illustrate how the "knobs" of our proposed 
method can be used in practice. In particular, the goal is not to illustrate that our method or 
heuristic variants of it or other spectral-based methods scale to much larger graphs — this latter 
fact is by now well-established [H [201 EH [22] . 

5.2.1 Algorithm Description and Implementation 

We refer to our cut-finding algorithm, which will be used to guide our empirical study of finding
and evaluating cuts around an input seed set of nodes and which is a straightforward extension of
the algorithm referred to in Theorem 5, as LocalCut. In addition to the graph, the input
parameters for LocalCut are a seed vector $s$ (e.g., corresponding to a single vertex $v$), a teleportation
parameter $\gamma$, and (optionally) a size factor $c$. Then, LocalCut performs the following steps.

• First, compute the vector $x^*$ of Eqn. (1) with seed $s$ and teleportation $\gamma$.

• Second, either perform a sweep of the vector $x^*$, e.g., consider each of the $n$ cuts defined by
the vector and return the minimum conductance cut found along the sweep; or consider
only sweep cuts along the vector $x^*$ of volume at most $c \cdot k_\gamma$, where $k_\gamma = 1/\kappa_\gamma$, that contain
the input vertex $v$, and return the minimum conductance cut among such cuts.

By Theorem 1, the vector computed in the first step of LocalCut, $x^*$, is an optimal solution to
LocalSpectral$(G, s, \kappa_\gamma)$ for some choice of $\kappa_\gamma$. (Indeed, by fixing the above parameters, the $\kappa$
parameter is fixed implicitly.) Then, by Theorem 5, when the vector $x^*$ is rounded (to, e.g.,
$\{-1, +1\}$) by performing the sweep cut, provably-good approximations are guaranteed. In
addition, when the seed vector corresponds to a single vertex $v$, it follows from Lemma 1 that $x^*$ yields
a lower bound to the conductance of cuts that contain $v$ and have less than a certain volume $k_\gamma$.

Although the full sweep-cut rounding does not give a specific guarantee on the volume of
the output cut, empirically we have found that it is often possible to find small low-conductance
cuts in the range dictated by $k_\gamma$. Thus, in our empirical evaluation, we also consider
volume-constrained sweep cuts (which departs slightly from the theory but can be useful in practice).
That is, we also introduce a new input parameter, a size factor $c > 0$, that regulates the maximum
volume of the sweep cuts considered when $s$ represents a single vertex. In this case, LocalCut does
not consider all $n$ cuts defined by the vector $x^*$, but instead it considers only sweep cuts of volume
at most $c \cdot k_\gamma$ that contain the vertex $v$. (Note that it is a simple consequence of our optimization
characterization that the optimal vector has sweep cuts of volume at most $k_\gamma$ containing $v$.) This
new input parameter turns out to be extremely useful in exploring cuts at different sizes, as it
neglects sweep cuts of low conductance at large volume and allows us to pick out more local cuts
around the seed vertex.
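
The two steps of LocalCut can be sketched as follows for a single-vertex seed. This is a simplified illustration under several stated assumptions, not the MATLAB/C++ implementation used in our experiments: it assumes that the vector $x^*$ is proportional to the solution of $(L_G - \gamma D_G)\,x = D_G s$; it handles only $\gamma < 0$, where a simple fixed-point iteration converges; it takes $\kappa_\gamma = (s^T D_G x^*)^2$ after normalizing $x^*$; and it uses the conductance $2m \cdot |E(S,\bar{S})| / (\mathrm{vol}(S)\,\mathrm{vol}(\bar{S}))$. All function names and the toy graph are hypothetical.

```python
import math


def seed_vector(adj, T):
    """s_T as in Section 4.3: s^T D_G 1 = 0 and s^T D_G s = 1."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    vol_T = sum(len(adj[v]) for v in T)
    vol_Tc = two_m - vol_T
    scale = math.sqrt(vol_T * vol_Tc / two_m)
    return {v: scale / vol_T if v in T else -scale / vol_Tc for v in adj}


def local_vector(adj, s, gamma, iters=2000):
    """Solve (L_G - gamma D_G) x = D_G s for gamma < 0, then scale so x^T D_G x = 1.

    Uses the equivalent fixed-point form x = (D^{-1} A x + s) / (1 - gamma),
    which is a contraction whenever gamma < 0.
    """
    if gamma >= 0:
        raise ValueError("this sketch only handles gamma < 0")
    x = dict(s)
    for _ in range(iters):
        x = {v: (sum(x[w] for w in adj[v]) / len(adj[v]) + s[v]) / (1.0 - gamma) for v in adj}
    norm = math.sqrt(sum(len(adj[v]) * x[v] ** 2 for v in adj))
    return {v: x[v] / norm for v in adj}


def conductance(adj, S):
    two_m = sum(len(nbrs) for nbrs in adj.values())
    vol_S = sum(len(adj[v]) for v in S)
    cut = sum(1 for v in S for w in adj[v] if w not in S)
    return two_m * cut / (vol_S * (two_m - vol_S))


def local_cut(adj, seed, gamma, c=None):
    """Sweep x*; if c is given, keep only cuts containing the seed with volume <= c / kappa."""
    s = seed_vector(adj, {seed})
    x = local_vector(adj, s, gamma)
    kappa = sum(len(adj[v]) * s[v] * x[v] for v in adj) ** 2  # assumed kappa_gamma
    max_vol = None if c is None else c / kappa                # c * k_gamma, with k_gamma = 1 / kappa
    order = sorted(adj, key=lambda v: x[v], reverse=True)
    best_set, best_phi = None, float("inf")
    for j in range(1, len(order)):
        S = set(order[:j])
        vol_S = sum(len(adj[v]) for v in S)
        if max_vol is not None and (seed not in S or vol_S > max_vol):
            continue
        phi = conductance(adj, S)
        if phi < best_phi:
            best_set, best_phi = S, phi
    return best_set, best_phi


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
full_set, full_phi = local_cut(adj, seed=0, gamma=-0.05)
small_set, small_phi = local_cut(adj, seed=0, gamma=-0.05, c=2)
print(sorted(full_set), full_phi)
print(sorted(small_set), small_phi)
```

On this toy graph the unconstrained sweep returns the seed's triangle, while the volume-constrained sweep with $c = 2$ is restricted to a much smaller set around the seed. For $\gamma \in (0, \lambda_2(G))$ the linear system is no longer positive definite and a different solver would be needed.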



In our first two sets of experiments, summarized in Sections 5.2.2 and 5.2.3, we used
single-vertex seed vectors, and we analyzed the effects of varying the parameters $\gamma$ and $c$, as a function
of the location of the seed vertex in the input graph. In the last set of experiments, presented
in Section 5.2.4, we considered more general seed vectors, including both seed vectors that
correspond to multiple nodes, i.e., to cuts or partitions in the graph, as well as seed vectors that do
not have an obvious interpretation in terms of input cuts. We implemented our code in a
combination of MATLAB and C++, solving linear systems using the Stabilized Biconjugate Gradient
Method [31] provided in MATLAB 2006b. On this particular coauthorship network, and on a
Dell PowerEdge 1950 machine with 2.33 GHz and 16GB of RAM, the algorithm ran in less than
a few seconds.



5.2.2 Varying the Teleportation Parameter 

Here, we evaluate the effect of varying the teleportation parameter $\gamma \in (-\infty, \lambda_2(G))$, where recall
$\lambda_2(G) = 0.0029$. Since it is known that large social and information networks are quite
heterogeneous and exhibit a very strong "nested core-periphery" structure [20, 21, 22], we perform this
evaluation by considering the behavior of LocalCut when applied to three types of seed nodes,
examples of which are the highlighted vertices in Figure 5. These three nodes were chosen to
represent three different types of nodes seen in larger networks: a periphery-like node, which belongs
to a lower-degree and less expander-like part of the graph, and which tends to be surrounded by
lower-conductance cuts of small volume; a core-like node, which belongs to a denser and
higher-conductance or more expander-like part of the graph; and an intermediate node, which belongs
to a regime between the core-like and the periphery-like regions.

For each of the three representative seed nodes, we executed 1000 runs of LocalCut with $c = 2$
and $\gamma$ varying by 0.001 increments. Figure 6 displays, for each of these three seeds, a plot of the
conductance as a function of volume of the cuts found by each run of LocalCut. We refer to this
type of plot as a local profile plot since it is a specialization of the network community profile
plot [20, 21, 22] to cuts around the specified seed vertex. In addition, Figure 6 also plots several
other quantities of interest: first, the volume and conductance of the theoretical lower bound
yielded by each run; second, the volume and conductance of the cuts defined by the shortest-path
balls (in squares and numbered according to the length of the path) around each seed (which
should and do provide a sanity-check upper bound); third, next to each of the plots, we present
a color-coded image of representative cuts detected by LocalCut; and fourth, for each of the cuts
illustrated on the left, a color-coded triangle and the numerical value of $-\gamma$ are shown on the right.

Several points about the behavior of the LocalCut algorithm as a function of the location of
the input seed node, as illustrated in Figure 6, are worth emphasizing.



• First, for the core-like node, whose profile plot is shown in Figure 6(a), the volume of the
output cuts grows relatively smoothly as $\gamma$ is increased (i.e., as $-\gamma$ is decreased). For
small $\gamma$, e.g., $\gamma = -0.0463$ or $\gamma = -0.0207$, the output cuts are forced to be small and
hence display high conductance, as the region around the node is somewhat expander-like.
By decreasing the teleportation, the conductance progressively decreases, as the rounding
starts to hit nodes in peripheral regions, whose inclusion only improves conductance (since
it increases the cut volume without adding many additional cut edges). In this case, this
phenomenon ends at $\gamma = -0.0013$, when a cut of conductance value close to that of the
global optimum is found. (After that, larger and slightly better conductance cuts can still
be found, but, as discussed below, they require $\gamma > 0$.)

• Second, a similar interpretation applies to the profile plot of the intermediate node, as
shown in Figure 6(b). Here, however, the global component of the network containing the
seed has smaller volume, around 300, and a very low conductance (again, requiring $\gamma > 0$).
Thus, the profile plot jumps from this cut to the much larger eigenvector sweep cut, as will
be discussed below.

• Third, a more extreme case is that of the periphery-like node, whose profile plot is displayed
in Figure 6(c). In this case, an initial increase in $\gamma$ does not yield larger cuts. This vertex is
contained in a small-volume cut of low conductance, and thus diffusion-based methods get
"stuck" on the small side of the cut. The only cuts of lower conductance in the network are
those separating the global components, which can only be accessed when $\gamma > 0$. Hence,
the teleportation must be greatly decreased before the algorithm starts outputting cuts at
larger volumes. (As an aside, this behavior is also often seen with so-called "whiskers" in
much larger social and information networks [20, 21, 22].)

In addition, several general points that are illustrated in Figure 6 are worth emphasizing about
the behavior of our algorithm.

• First, LocalCut found low-conductance cuts of different volumes around each seed vertex,
outperforming the shortest-path algorithm (as it should) by a factor of roughly 4 in most
cases. However, the results of LocalCut still lie away from the lower bound, which is also a
factor of roughly 4 smaller at most volumes.

(a) Selected cuts and profile plot for the core-like node.
(b) Selected cuts and profile plot for the intermediate node.
(c) Selected cuts and profile plot for the periphery-like node.

Figure 6: [Best viewed in color.] Selected cuts and local profile plots for varying $\gamma$. The cuts on
the left are displayed by assigning to each vertex a color corresponding to the smallest selected
cut in which the vertex was included. Smaller cuts are darker, larger cuts are lighter; and the
seed vertex is shown slightly larger. Each profile plot on the right shows results from 1000 runs of
LocalCut, with $c = 2$ and $\gamma$ decreasing in 0.001 increments starting at 0.0028. For each color-coded
triangle, corresponding to a cut on the left, $-\gamma$ is also listed.



• Second, consider the range of the teleportation parameter necessary for the LocalCut
algorithm to discover the well-balanced globally-optimal spectral partition. In all three cases,
it was necessary to make $\gamma$ positive (i.e., $-\gamma$ negative) to detect the well-balanced global
spectral cut. Importantly, however, the quantitative details depend strongly on whether the
seed is core-like, intermediate, or periphery-like. That is, by formally allowing "negative
teleportation" probabilities, which correspond to $\gamma > 0$, the use of generalized
Personalized PageRank vectors as an exploratory tool is much stronger than the usual Personalized
PageRank [1, 2], in that it permits one to find a larger class of clusters, up to and including
the global partition found by the solution to the global Spectral program. Relatedly, it
provides a smooth interpolation between Personalized PageRank and the second
eigenvector of the graph. Indeed, for $\gamma = 0.0028 \approx \lambda_2(G)$, LocalCut outputs the same cut as the
eigenvector sweep cut for all three seeds.

• Third, recall that, given a teleportation parameter $\gamma$, the rounding step selects the cut of
smallest conductance along the sweep cut of the solution vector. (Alternatively, if
volume-constrained sweeps are considered, then it selects the cut of smallest conductance among
sweep cuts of volume at most $c \cdot k_\gamma$, where $k_\gamma$ is the lower bound obtained from the
optimization program.) In either case, increasing $\gamma$ can lead LocalCut to pick out larger cuts,
but it does not guarantee this will happen. In particular, due to the local topology of
the graph, in many instances there may not be a way of slightly increasing the volume of
a cut while slightly decreasing its conductance. In those cases, LocalCut may output the
same small sweep cut for a range of teleportation parameters until a much larger, much
lower-conductance cut is then found. The presence of such horizontal and vertical jumps in
the local profile plot conveys useful information about the structure of the network in the
neighborhood of the seed at different size scales, illustrating that the practice follows the
theory quite well.



5.2.3 Varying the Output-Size Parameter 

Here, we evaluate the effect of varying the size factor $c$, for a fixed choice of teleportation parameter
$\gamma$. (In the previous section, $c$ was fixed at $c = 2$ and $\gamma$ was varied.) We have observed that varying
$c$, like varying $\gamma$, tends to have the effect of producing low-conductance cuts of different volumes
around the seed vertex. Moreover, it is possible to obtain low-conductance large-volume cuts,
even at lower values of the teleportation parameter, by increasing $c$ to a sufficiently large value.
This is illustrated in Figure 7, which shows the result of varying $c$ with the core-like node as
the seed and $-\gamma = 0.02$. Figure 6(a) illustrated that when $c = 2$, this setting only yielded a
cut of volume close to 100 (see the red triangle with $-\gamma = 0.0207$); but the yellow crosses in
Figure 7 illustrate that by allowing larger values of $c$, better conductance cuts of larger volume
can be obtained.

While many of these cuts tend to have conductance slightly worse than the best found by
varying the teleportation parameter, the observation that cuts of a wide range of volumes can
be obtained with a single value of $\gamma$ leaves open the possibility that there exists a single choice
of teleportation parameter $\gamma$ that produces good low-conductance cuts at all volumes simply
by varying $c$. (This would allow us to only solve a single optimization problem and still find
cuts of different volumes.) To address (and rule out) this possibility, we selected three choices
of the teleportation parameter for each of the three seed nodes, and then we let $c$ vary. The
resulting output cuts for the core-like node as the seed are plotted (in blue, green, and yellow)
in Figure 7. (The plots for the other seeds are similar and are not displayed.) Clearly, no single
teleportation setting dominates the others: in particular, at volume 200 the lowest-conductance
cut was produced with $-\gamma = 0.02$; at volume 400 it was produced with $-\gamma = 0.01$; and at volume
600 it was produced with $\gamma = 0$. The highest choice of $\gamma = 0$ performed marginally better
overall, recording lowest conductance cuts at both small and large volumes. That being said, the
results of all three settings roughly track each other, and cuts of a wide range of volumes could
be obtained by varying the size parameter $c$.

Figure 7: [Best viewed in color.] Selected cuts and local profile plots for varying $c$ with the
core-like node as the seed. The cuts are displayed by assigning to each vertex a color corresponding
to the smallest selected cut in which the vertex was included. Smaller cuts are darker, larger
are lighter. The seed vertex is shown larger. The profile plot shows results from 1000 runs of
LocalCut, with varying $c$ and $-\gamma \in \{0, 0.01, 0.02\}$.

These and other empirical results suggest that the best results are achieved when we vary both
the teleportation parameter and the size factor. In addition, the use of multiple teleportation
choices has the side-effect advantage of yielding multiple lower bounds at different volumes.



5.2.4 Multiple Seeds and Correlation 

Here, we evaluate the behavior of LocalCut on more general seed vectors. We consider two 
examples — for the first example, there is an interpretation as a cut or partition consisting of 
multiple nodes; while the second example does not have any immediate interpretation in terms 
of cuts or partitions. 

In our first example, we consider a seed vector representing a subset of four nodes, located in
different peripheral branches of the left half of the global partition of the network: see the
four slightly larger (and darker) vertices in Figure 8(a). This is of interest since, depending on
the size-scale at which one is interested, such sets of nodes can be thought of as either "nearby"
or "far apart." For example, when viewing the entire graph of 379 nodes, these four nodes are
all close, in that they are all on the left side of the optimal global spectral partition; but when
considering smaller clusters such as well-connected sets of 10 or 15 nodes, these four nodes are
much farther apart. In Figure 8(a), we display a selection of the cuts found by varying the
teleportation, with $c = 2$. The smaller cuts tend to contain the branches in which each seed node
is found, while larger cuts start to incorporate nearby branches. Not shown in the color-coding
is that the optimal global spectral partition is eventually recovered. Identifying peripheral areas
that are well-separated from the rest of the graph is a useful primitive in studying the structure
of social networks [20, 21, 22]; and thus, this shows how LocalCut may be used in this context,
when some periphery-like seed nodes of the graph are known.

(a) Seed set of four seed nodes.
(b) A more general seed vector.

Figure 8: [Best viewed in color.] Multiple seeds and correlation. 8(a) shows selected cuts for
varying $\gamma$ with the seed vector corresponding to a subset of 4 vertices lying in the periphery-like
region of the network. 8(b) shows selected cuts for varying $\gamma$ with the seed vector equal to a
normalized version of the degree vector. In both cases, the cuts are displayed by assigning to
each vertex a color corresponding to the smallest selected cut in which the vertex was included.
Smaller cuts are darker, larger are lighter.

In our second example, we consider a seed vector that represents a feature vector on the
vertices but that does not have an interpretation in terms of cuts. In particular, we consider a
seed vector that is a normalized version of the degree distribution vector. Since nodes that are
periphery-like tend to have lower degree than those that are core-like [20, 21, 22], this choice
of seed vector biases LocalCut towards cuts that are well-correlated with periphery-like and
low-degree vertices. A selection of the cuts found on this seed vector when varying the teleportation
with $c = 2$ is displayed in Figure 8(b). These cuts partition the network naturally into three
well-separated regions: a sparser periphery-like region in darker colors, a lighter-colored intermediate
region, and a white dense core-like region, where higher-degree vertices tend to lie. Clearly, this
approach could be applied more generally to find low-conductance cuts that are well-correlated
with a known feature of the node vector.
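
One way to turn an arbitrary vertex feature into a valid seed vector is sketched below. This is an illustration under stated assumptions rather than the exact normalization used for Figure 8(b): the function name is hypothetical, the feature is projected so that $s^T D_G 1 = 0$ and then scaled so that $s^T D_G s = 1$, and the sign is chosen (by negating the degrees) so that low-degree vertices receive the largest entries.

```python
import math


def seed_from_feature(deg, feature):
    """Project a vertex feature onto {x : x^T D_G 1 = 0} and scale so that s^T D_G s = 1."""
    two_m = sum(deg.values())
    # Degree-weighted mean; subtracting it enforces s^T D_G 1 = 0.
    mean = sum(deg[v] * feature[v] for v in deg) / two_m
    centered = {v: feature[v] - mean for v in deg}
    norm = math.sqrt(sum(deg[v] * centered[v] ** 2 for v in deg))
    if norm == 0.0:
        raise ValueError("feature is constant, so it carries no cut information")
    return {v: centered[v] / norm for v in deg}


# Degrees of a toy graph (hypothetical): two triangles joined by the edge 2-3.
deg = {0: 2, 1: 2, 2: 3, 3: 3, 4: 2, 5: 2}
# Negated degree as the feature, so that low-degree vertices get the largest seed values.
s = seed_from_feature(deg, {v: -float(d) for v, d in deg.items()})
print(s)
```

The resulting vector lies in $S_D$ and can be passed to LocalSpectral in place of a cut-based seed such as $s_T$.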



6 Discussion 

In this final section, we provide a brief discussion of our results in a broader context. 

Relationship to local graph partitioning. Recent theoretical work has focused on using 
spectral ideas to find good clusters near an input seed set of nodes [30, 1, 10]. In particular, 
local graph partitioning (roughly, the problem of finding a low-conductance cut in a graph in time 
depending only on the volume of the output cut) was introduced by Spielman and Teng [30]. 
They used random-walk-based methods, and they used this as a subroutine to give a nearly 
linear-time algorithm for outputting balanced cuts that match the Cheeger Inequality up to 
polylog factors. In our language, a local graph partitioning algorithm would start a random walk 
at a seed node, truncate the walk after a suitably chosen number of steps, and output 
the nodes visited by the walk. This result was improved by Andersen, Chung and Lang [1] 
by performing a truncated Personalized PageRank computation. These and subsequent papers 
building on them were motivated by local graph partitioning [10], but they do not address the 
problem of discovering cuts near general seed vectors, as we do, or of generalizing the second 
eigenvector of the Laplacian. Moreover, these approaches are more operationally-defined, while 
ours is axiomatic and optimization-based. 
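
As a rough illustration of the random-walk pipeline described above, the following Python sketch computes a Personalized PageRank vector by plain power iteration and then takes a sweep cut over the degree-normalized vector. It is not the truncated procedure of [1] or [30]: it touches the whole graph, so its running time does not depend only on the volume of the output cut. The function names, the teleportation value 0.15, the toy graph, and the conductance normalization cut(S) / min(vol(S), vol(V \ S)) are all assumptions made for the example.

```python
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, iters=200):
    """Dense power iteration for p = alpha * seed + (1 - alpha) * A D^{-1} p."""
    adj = np.asarray(adj, dtype=float)
    d = adj.sum(axis=1)
    W = adj / d                      # W[i, j] = adj[i, j] / d[j], column-stochastic
    p = seed.copy()
    for _ in range(iters):
        p = alpha * seed + (1 - alpha) * (W @ p)
    return p

def sweep_cut(adj, p):
    """Best-conductance prefix of the vertices sorted by p / degree."""
    adj = np.asarray(adj, dtype=float)
    d = adj.sum(axis=1)
    vol_G = d.sum()
    order = np.argsort(-p / d)
    in_set = np.zeros(len(d), dtype=bool)
    vol = cut = 0.0
    best_phi, best_k = np.inf, 0
    for k, v in enumerate(order[:-1], start=1):
        # Edges from v into the current set stop being cut; its other edges
        # become cut (assumes no self-loops).
        inside = adj[v, in_set].sum()
        cut += d[v] - 2 * inside
        vol += d[v]
        in_set[v] = True
        phi = cut / min(vol, vol_G - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return sorted(order[:best_k].tolist()), best_phi

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
seed = np.zeros(6)
seed[0] = 1.0
cluster, phi = sweep_cut(A, personalized_pagerank(A, seed))
```

On this toy graph the sweep recovers the triangle containing the seed vertex, which is separated from the rest of the graph by a single edge.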

Relationship to empirical work on community structure. Recent empirical work has 
used Personalized PageRank, a particular variant of a local random walk, to characterize very 
finely the clustering and community structure in a wide range of very large social and 
information networks [2, 20, 21, 22]. In particular, Andersen and Lang used local spectral methods to 
identify communities in certain informatics graphs using an input set of nodes as a seed set [2]. 
Subsequently, Leskovec, Lang, Dasgupta, and Mahoney used related methods to characterize the 
small-scale and large-scale clustering and community structure in a wide range of large social and 
information networks [20, 21, 22]. Our optimization program and empirical results suggest that 
this line of work can be extended to ask in a theoretically principled manner much more refined 
questions about graph structure near prespecified seed vectors. 

Relationship to cut-improvement algorithms. Many recently-popular algorithms for finding 
minimum-conductance cuts, such as those in [17, 28], use as a crucial building block a primitive 
that takes as input a cut $(T, \bar{T})$ and attempts to find a lower-conductance cut that is well 
correlated with $(T, \bar{T})$. This primitive is referred to as a cut-improvement algorithm [18, 3], as 
its original purpose was limited to post-processing cuts output by other algorithms. Recently, 
cut-improvement algorithms have also been used to find low-conductance cuts in specific regions 
of large graphs [22]. Given a notion of correlation between cuts, cut-improvement algorithms 
typically produce approximation guarantees of the following form: for any cut $(C, \bar{C})$ that is 
$\epsilon$-correlated with the input cut, the cut output by the algorithm has conductance upper-bounded 
by a function of the conductance of $(C, \bar{C})$ and $\epsilon$. This line of work has typically used flow-based 
techniques. For example, Gallo, Grigoriadis and Tarjan [12] were the first to show that one can 
find a subset of an input set $T \subseteq V$ with minimum conductance in polynomial time. Similarly, 
Lang and Rao [18] implement a closely related algorithm and demonstrate its effectiveness at 
refining cuts output by other methods. Finally, Andersen and Lang [3] give a more general 
algorithm that uses a small number of single-commodity maximum-flows to find low-conductance 
cuts not only inside the input subset $T$, but among all cuts which are well-correlated with $(T, \bar{T})$. 
Viewed from this perspective, our work may be seen as a spectral analogue of these flow-based 
techniques, since Theorem 6 provides lower bounds on the conductance of other cuts as a function 
of how well-correlated they are with the seed vector. 

Alternate interpretation of our main optimization program. There are a few interesting 
ways to view our local optimization problem of Figure 1 that we would like to point out here. Recall 
that LocalSpectral may be interpreted as augmenting the standard spectral optimization program 
with a constraint that the output cut be well-correlated with the input seed set. To understand 
this program from the perspective of the dual, recall that the dual of LocalSpectral is given by 
the following. 

$$
\begin{aligned}
\text{maximize} \quad & \alpha + \beta\kappa \\
\text{s.t.} \quad & L_G \succeq \alpha L_{K_n} + \beta\,\Omega_T \\
& \beta \ge 0,
\end{aligned}
$$

where $\Omega_T = D_G s_T s_T^T D_G$. Alternatively, by subtracting the second constraint of LocalSpectral 
from the first constraint, it follows that 

$$
x^T \left( L_{K_n} - L_{K_n} s_T s_T^T L_{K_n} \right) x \le 1 - \kappa.
$$

It can be shown that 

$$
L_{K_n} - L_{K_n} s_T s_T^T L_{K_n} = \frac{L_{K_T}}{\mathrm{vol}(T)} + \frac{L_{K_{\bar{T}}}}{\mathrm{vol}(\bar{T})},
$$

where $L_{K_T}$ is the $D_G$-weighted complete graph on the vertex set $T$. Thus, LocalSpectral is clearly 
equivalent to 

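$$
\begin{aligned}
\text{minimize} \quad & x^T L_G x \\
\text{s.t.} \quad & x^T L_{K_n} x = 1 \\
& x^T \left( \frac{L_{K_T}}{\mathrm{vol}(T)} + \frac{L_{K_{\bar{T}}}}{\mathrm{vol}(\bar{T})} \right) x \le 1 - \kappa.
\end{aligned}
$$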

The dual of this program is given by the following. 

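$$
\begin{aligned}
\text{maximize} \quad & \alpha - \beta(1 - \kappa) \\
\text{s.t.} \quad & L_G \succeq \alpha L_{K_n} - \beta \left( \frac{L_{K_T}}{\mathrm{vol}(T)} + \frac{L_{K_{\bar{T}}}}{\mathrm{vol}(\bar{T})} \right) \\
& \beta \ge 0.
\end{aligned}
$$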

From the perspective of this dual, this can be viewed as "embedding" a combination of a complete 
graph $K_n$ and a weighted combination of complete graphs on the sets $T$ and $\bar{T}$, i.e., $K_T$ and $K_{\bar{T}}$. 
Depending on the value of $\beta$, the latter terms clearly discourage cuts that substantially cut into 
$T$ or $\bar{T}$, thus encouraging partitions that are well-correlated with the input cut $(T, \bar{T})$. 
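
The matrix identity used in this derivation can be sanity-checked numerically. The Python sketch below relies on definitions that are not restated in this section and are therefore assumptions of the example: $L_{K_n} = D_G - d d^T / \mathrm{vol}(G)$ with $d$ the degree vector, the seed $s_T = \sqrt{\mathrm{vol}(T)\,\mathrm{vol}(\bar{T})/\mathrm{vol}(G)}\,(1_T/\mathrm{vol}(T) - 1_{\bar{T}}/\mathrm{vol}(\bar{T}))$, and $L_{K_T}$ taken to be the Laplacian of the complete graph on $T$ with edge weights $d_i d_j$. The function name and the toy graph are also hypothetical.

```python
import numpy as np

def check_identity(adj, T):
    """Check L_Kn - L_Kn s s^T L_Kn == L_KT / vol(T) + L_KTbar / vol(Tbar)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    d = adj.sum(axis=1)
    vol_G = d.sum()
    in_T = np.zeros(n, dtype=bool)
    in_T[list(T)] = True
    vol_T, vol_Tc = d[in_T].sum(), d[~in_T].sum()

    # Assumed definition of the D_G-weighted complete graph on all vertices.
    L_Kn = np.diag(d) - np.outer(d, d) / vol_G
    # Assumed seed vector for the cut (T, Tbar).
    s = np.sqrt(vol_T * vol_Tc / vol_G) * (in_T / vol_T - (~in_T) / vol_Tc)

    def weighted_complete(mask):
        # Laplacian of the complete graph on `mask` with edge weights d_i * d_j.
        dm = d * mask
        return dm.sum() * np.diag(dm) - np.outer(dm, dm)

    lhs = L_Kn - L_Kn @ np.outer(s, s) @ L_Kn
    rhs = weighted_complete(in_T) / vol_T + weighted_complete(~in_T) / vol_Tc
    return bool(np.allclose(lhs, rhs))

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
ok_triangle = check_identity(A, [0, 1, 2])
ok_other = check_identity(A, [0, 4])
```

Under these assumed definitions the two sides agree for any nonempty proper subset $T$; with a different scaling of the edge weights of $K_T$ the constants on the right-hand side would change accordingly.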

Bounding the size of the output cut. Readers familiar with the spectral method may recall 
that given a graph with a small balanced cut, it is not possible, in general, to guarantee that 
the sweep cut procedure of Theorem 4 applied to the optimal solution of Spectral outputs a balanced cut. 
One may have to iterate several times before one gets a balanced cut. Our setting, building on 
the spectral method, also suffers from this; we cannot hope, in general, to bound the size of the 
output cut (which is a sweep cut) in terms of the correlation parameter $\kappa$. This was the reason 
for considering volume-constrained sweep cuts in our empirical evaluation. 
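
One plausible reading of a volume-constrained sweep cut is sketched below in Python: among the prefixes of the vertices sorted by an embedding vector, only those whose volume stays below a given bound are considered, and the one with the lowest conductance is returned. The function name, the `max_vol` parameter, the sort by raw entries of `x`, and the conductance normalization cut(S) / min(vol(S), vol(V \ S)) are assumptions of the example and are not taken from the empirical evaluation itself.

```python
import numpy as np

def volume_constrained_sweep(adj, x, max_vol):
    """Lowest-conductance sweep prefix of x with volume at most max_vol."""
    adj = np.asarray(adj, dtype=float)
    d = adj.sum(axis=1)
    vol_G = d.sum()
    order = np.argsort(-np.asarray(x, dtype=float))
    best = (np.inf, [])
    for k in range(1, len(order)):
        S = order[:k]
        vol = d[S].sum()
        if vol > max_vol:
            break                      # larger prefixes only have larger volume
        mask = np.zeros(len(d), dtype=bool)
        mask[S] = True
        cut = adj[mask][:, ~mask].sum()    # total weight of edges leaving S
        phi = cut / min(vol, vol_G - vol)
        if phi < best[0]:
            best = (phi, sorted(S.tolist()))
    return best

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
x = np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])   # arbitrary embedding for the demo
phi_large, S_large = volume_constrained_sweep(A, x, max_vol=7)
phi_small, S_small = volume_constrained_sweep(A, x, max_vol=5)
```

With the bound set to 7 the sweep can reach the full triangle; with the bound set to 5 it must stop earlier and returns a smaller, higher-conductance set, which shows how the volume bound controls the size of the output.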